# README FILE FOR: Air Pollution and Solar Energy: Evidence from Wildfires

## Data Access

All data used in this paper are publicly available, either through websites or through Google Earth Engine.

The replication package contains the final merged data sets used to produce the tables and figures. Raw data (~100GB) are not included because of their size but are available upon request (email: sk5316@columbia.edu). The code that builds the final data set, which consists of 10 parts, is in "R_Step0_DataCollectionAndCleaning.R".

## Description of scripts

### Data Cleaning

`R_Step0_DataCollectionAndCleaning.R`

- Approximate run time: 18-24 hours

- Ten-part code that walks step by step through building the final data set for analysis
	- Part 1: Rooftop solar generation data: 15-minute to hourly data
		- Raw data source: CSI 15-minute interval rooftop solar data (https://www.californiadgstats.ca.gov/downloads/#_csi_15_id)
	- Part 2: Merging hourly rooftop data with CSI application data
		- Raw data source: CSI application data (https://www.californiadgstats.ca.gov/downloads/#_csi_wds)
	- Part 3: Aggregation of rooftop solar data to daily level
	- Part 4: Wildfire smoke data
		- Raw data source: NOAA HMS (https://www.ospo.noaa.gov/Products/land/hms.html#data)
	- Part 5: Weather covariates
		- Raw data source: gridMET, via Google Earth Engine (https://developers.google.com/earth-engine/datasets/catalog/IDAHO_EPSCOR_GRIDMET; see the code used for GEE retrieval)
	- Part 6: AOD data
		- Appended to Parts 5 and 6: "Step0_GEECodes.txt", for Google Earth Engine retrieval of weather and AOD data
	- Part 7: Downscaled PM data
		- Raw data source: Reid et al. 2021 (https://figshare.com/articles/dataset/Machine_learning_derived_daily_PM2_5_concentration_estimates_from_by_County_ZIP_code_and_census_tract_in_11_western_states_2008-2018/12568496/1)
	- Part 8: Simulated radiation
		- Raw data source: NSRDB (API access information at https://developer.nrel.gov/docs/solar/nsrdb)
	- Part 9: Fusing all data together
	- Part 10: Data set for AOD-PM2.5 relationship analysis
		- Raw data source: EPA PM2.5 and PM10 measurement data (https://www.epa.gov/outdoor-air-quality-data/download-daily-data)
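
The temporal aggregation in Parts 1 and 3 (15-minute to hourly, then hourly to daily) can be illustrated with a minimal base R sketch. This is not the code in the script: the column names (`system_id`, `timestamp`, `kwh`) and the toy values are hypothetical, and summing assumes the readings are energy per interval. If the readings were average power, a mean would be the more natural choice.

```r
# Toy 15-minute readings for one hypothetical system (values are made up).
readings <- data.frame(
  system_id = "A",
  timestamp = as.POSIXct("2015-07-01 10:00:00", tz = "UTC") + 900 * 0:7,
  kwh       = c(0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2)
)

# Part 1 analogue: truncate each timestamp to the start of its hour,
# then sum the interval readings within each system-hour.
readings$hour <- as.POSIXct(trunc(readings$timestamp, units = "hours"))
hourly <- aggregate(kwh ~ system_id + hour, data = readings, FUN = sum)

# Part 3 analogue: collapse hourly totals to one row per system-day.
hourly$date <- as.Date(hourly$hour, tz = "UTC")
daily <- aggregate(kwh ~ system_id + date, data = hourly, FUN = sum)

print(hourly)
print(daily)
```

With the full data set, a grouped operation in `data.table` or `dplyr` (both listed under Software Information) would likely be faster than `aggregate()`.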

`R_Step0_MiscallaneousFigures.R`

- Approximate run time: 15 minutes

- Replicates the figures that showcase the data set: Figure 1 and Appendix Figure F1
	- Note that we used QGIS to place the CA map in the top-left corner of Figure 1(a)

### Data Analysis


`R_Step1_SolarGenerationAndAOD.R`

- Approximate run time: 1 hour

- Replicates our solar generation-AOD results shown in the main text (Tables 2, 3, 5, and 6)

- Also replicates most of Table 1 (summary statistics); the remaining rows of Table 1 are produced by the "Step2" code

- Also produces Appendix Figure F3


`R_Step2_AODandPM2.5.R`

- Approximate run time: 30 minutes

- Replicates our AOD-PM2.5 results shown in the main text (Table 3)

- Also replicates the remaining rows of Table 1


`R_Step3_PolicyAnalysis.R`

- Approximate run time: 1 hour

- Replicates headline policy analysis results: Figures 2 and 3, Appendix Figures F14-F18, headline numbers (e.g., $177/ton/yr), Appendix Tables F3 and F4


`R_StepAppendix_SolarGenerationAndAOD.R`

- Approximate run time: 1.5 hours

- Replicates additional analyses of our solar generation-AOD results: Appendix Section B, Appendix Table D1, Appendix Section E, Appendix Figure F2


`R_StepAppendix_AODandPM2.5.R`

- Approximate run time: 1 hour

- Replicates additional analyses of our AOD-PM2.5 results: Appendix Table D2 and its motivations (Appendix Figures F4-F7)

- Replicates the additional analyses in Appendix Section C: Appendix Figures C1 to C3


`R_StepAppendix_PolicyAnalysis.R`

- Approximate run time: 30 minutes

- Replicates additional analyses on the shifts of EJScreen variables from 2010 to 2016: Appendix Figures F8 to F13
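
For convenience, the analysis scripts can be run in sequence with a small driver. The sketch below is only a suggestion and is not part of the package: the helper `run_scripts()` is hypothetical, and it assumes that the scripts sit in a single directory, that they can be run in the order listed above, and that each one sets up its own inputs.

```r
# Analysis scripts named in this README, in the order they are listed.
analysis_scripts <- c(
  "R_Step1_SolarGenerationAndAOD.R",
  "R_Step2_AODandPM2.5.R",
  "R_Step3_PolicyAnalysis.R",
  "R_StepAppendix_SolarGenerationAndAOD.R",
  "R_StepAppendix_AODandPM2.5.R",
  "R_StepAppendix_PolicyAnalysis.R"
)

# Hypothetical helper: source every script found in `dir` and return a
# table that records which scripts were found.
run_scripts <- function(scripts, dir = ".") {
  paths <- file.path(dir, scripts)
  found <- file.exists(paths)
  for (p in paths[found]) {
    message("Running ", p)
    source(p, echo = FALSE)
  }
  invisible(data.frame(script = scripts, found = found))
}
```

Calling `run_scripts(analysis_scripts)` from the package directory would then run whichever of these scripts are present. Based on the approximate run times above, a full pass would take roughly 5.5 hours.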



## Software Information

R version 4.3.0 (2023-04-21 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 11 x64 (build 22631)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.utf8  LC_CTYPE=English_United States.utf8    LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C                           LC_TIME=English_United States.utf8    

time zone: America/New_York
tzcode source: internal

attached base packages:
[1] splines   stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] nngeo_0.4.7        openair_2.18-2     xts_0.13.1         data.table_1.14.10 sf_1.0-16          wesanderson_0.3.7 
 [7] zoo_1.8-12         stargazer_5.2.3    scales_1.3.0       paletteer_1.5.0    ggpubr_0.6.0       ggsci_3.0.0       
[13] texreg_1.39.3      lfe_2.9-0          Matrix_1.6-5       binaryLogic_0.3.9  lubridate_1.9.3    forcats_1.0.0     
[19] stringr_1.5.1      dplyr_1.1.4        purrr_1.0.2        readr_2.1.5        tidyr_1.3.0        tibble_3.2.1      
[25] ggplot2_3.4.4      tidyverse_2.0.0   

loaded via a namespace (and not attached):
 [1] DBI_1.2.2           deldir_2.0-2        rematch2_2.1.2      sandwich_3.1-0      rlang_1.1.3        
 [6] magrittr_2.0.3      tseries_0.10-55     e1071_1.7-14        compiler_4.3.0      mgcv_1.8-42        
[11] maps_3.4.2          png_0.1-8           vctrs_0.6.5         quadprog_1.5-8      pkgconfig_2.0.3    
[16] crayon_1.5.2        backports_1.4.1     labeling_0.4.3      utf8_1.2.4          tzdb_0.4.0         
[21] bit_4.0.5           jpeg_0.1-10         terra_1.7-65        broom_1.0.5         parallel_4.3.0     
[26] cluster_2.1.4       R6_2.5.1            stringi_1.8.3       RColorBrewer_1.1-3  car_3.1-2          
[31] lmtest_0.9-40       Rcpp_1.0.12         nnet_7.3-18         timechange_0.2.0    tidyselect_1.2.0   
[36] rstudioapi_0.15.0   abind_1.4-5         timeDate_4032.109   codetools_0.2-19    suncalc_0.5.1      
[41] curl_5.2.0          lattice_0.21-8      quantmod_0.4.25     withr_3.0.0         urca_1.3-3         
[46] units_0.8-5         proxy_0.4-27        pillar_1.9.0        carData_3.0-5       KernSmooth_2.23-20 
[51] generics_0.1.3      TTR_0.24.4          vroom_1.6.5         forecast_8.21.1     hms_1.1.3          
[56] munsell_0.5.1       xtable_1.8-4        class_7.3-21        glue_1.7.0          mapproj_1.2.11     
[61] tools_4.3.0         interp_1.1-6        hexbin_1.28.3       ggsignif_0.6.4      grid_4.3.0         
[66] latticeExtra_0.6-30 colorspace_2.1-0    nlme_3.1-162        fracdiff_1.5-2      Formula_1.2-5      
[71] cli_3.6.2           fansi_1.0.6         gtable_0.3.4        rstatix_0.7.2       prismatic_1.1.1    
[76] classInt_0.4-10     farver_2.1.1        lifecycle_1.0.4     httr_1.4.7          MASS_7.3-58.4      
[81] bit64_4.0.5        
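
Before running the scripts, it may help to compare the locally installed package versions with those listed above. The following is a minimal sketch: the helper `check_versions()` is hypothetical, and only a subset of the attached packages is checked, with version strings copied from the list above.

```r
# Versions copied from the "other attached packages" list above (subset only).
expected <- c(
  data.table = "1.14.10",
  sf         = "1.0-16",
  lfe        = "2.9-0",
  dplyr      = "1.1.4",
  ggplot2    = "3.4.4"
)

# Hypothetical helper: classify each package as "match", "differs", or "missing".
check_versions <- function(expected) {
  vapply(names(expected), function(pkg) {
    # system.file() returns "" when the package is not installed.
    if (!nzchar(system.file(package = pkg))) return("missing")
    if (utils::packageVersion(pkg) == expected[[pkg]]) "match" else "differs"
  }, character(1))
}

status <- check_versions(expected)
print(status)
```

A "differs" result does not necessarily mean that the scripts will fail; it only flags a departure from the environment in which the results were produced.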
